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Abstract 

The joint cumulative distribution function for order statistics aris- 
ing from several different populations is given in terms of the distri- 
bution function of the populations. The computational cost of the 
formula in the case of two populations is still exponential in the worst 
case, but it is a dramatic improvement compared to the general for- 
mula by Bapat and Beg. In the case when only the joint distribution 
function of a subset of the order statistics of fixed size is needed, the 
complexity is polynomial, for the case of two populations. 

Keywords: block matrix, computational complexity, multiple comparison. 
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1 INTRODUCTION 



The iBenjamini and Hochber procedure represents one of what has 

become a rather large class of techniques in which we would like to be able to 
calculate order statistics arising from several populations. The complexity 
of the calculations implied by such approaches has remained a barrier to 
accurate probability statements. We provide tools which greatly extend the 
range of computable cases. 

Order statistics obtained by sampling from two different populations oc- 
cur, e.g., when p- values arise from null or alternative hypotheses, from men 
or women, or from two different types of cancer. 

The distribution of order statistics for independent, identically distributed 
random variables is well known, and appears in every ba sic statistics book; for 
examp le, iHogg and Craia (119781. C h apter 4, Section 6). iDavid and Nagaraja 
( 20031 ) and iBalakrishnan and Rad ( 19981 ) provide a thorough review of or- 
der statistics. For identically distributed random variables, the cumulative 
distribution function is concise and fast to compute. 

For independent, but not identically distributed random variables, a for- 
mula for computing t he joint cumulat i ve di stribution function of the order 
statistics was given by lBapat and Begl (119891 ). However, this formula is com- 
putationally intractable, because it involves an exponential number of per- 
manents of the size of the number of random variables. In addition, the com- 
plexity of the computation of the permanent by the best algorithms grows 
exponentially (IKnuthl. Il998l. p . 499). Approximate algo r ithms for cornputing 
the permanent (jValiantl . 1 19791 : iForbert and Marxl . l2003l : iJerrum et al.l . |200J) 
with lower asymptotic complexity are still not practical. 

We show that the computational cost of the formula in the case of two 
populations is still exponential, but is a dramatic improvement compared 
to the general formula by Bapat and Beg. In the case when only the joint 
distribution function of a subset of the order statistics of fixed size is needed, 
we show that the complexity is polynomial, in the case of two populations. 
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2 NOTATION AND PRELIMINARIES 



For an m x m matrix A, with entries a^, the permanent is given by lAitken 



(119391 . p. 30) 

m 

per [-4] = ^ JJ ai,^(i) . (1) 

TT i=l 

where vr ranges over all permutations of {1,2, . . . ,m}. Hence, the perma- 
nent is defined much like the determinant, but with all signs positive. The 
permanent can be expanded by row or columns exactly like the determinant. 
The computational cost of evaluating the permanent by expansion is 0(m!) 
operations. T he computational cost using the best algorithms is exponential 



Knuth] (119981 . p. 499). 

The following notation will be used in all theorems and proofs in this 
paper without further explicit reference. Xi, i = 1, . . . ,m are independent 
real valued random variables with cumulative distribution functions Fi (x). 
The order statistics Yi, 1^2, • • • , are random variables defined by sorting the 
values of X^. In particular, Yi <Y2 < . . . < Ym- The arguments of the joint 
cumulative distribution function of order statistics are customarily written 
omitting redundant arguments; thus let n , 1 < ni < ^2 < • ■ ■ < < 
denote the indices of the remaining arguments and ?/i < ?/2 < ■ ■ ■ < 1/fc their 
values. Finally, define the index vector i = {io, ii, ■ ■ ■ ik+i) and the summation 
index set 

J ^ f. . = io < «i < ■ ■ • < 4 < 4+1 = 1 
' and ij > rij for all 1 < j < A; J ' 

Writing summation over the set 1 in terms of loops is straightforward. Using 
the set X instead of the loop in this paper allows an insight into the structure 
of the method and its complexity, and it does not tie the mathematical 
formulation to any particular implementation. 

The joint cumulative distribution function of the set {l^m, Yn^, . . . , ^nj.}, 
which is a subset of the complete set of order statistics, is defined as 

i"y„„...y„, (l/i, . . . , l/fc) = Pr {{Y^, < yi) A {Y^, < ys) A ■ ■ ■ A {Y^^ <yk)}. 

(3) 

For two sequences am and 6^, let ~ &m denote limm^oo Qm/^m = 1- 
Let const be a generic positive constant independent of m; that is, const can 
have a different value every time it is used. Now a^n = O (bm) can be written 
as \am\ < const bm- 
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3 JOINT CUMULATIVE DISTRIBUTION 
FUNCTION OF ORDER STATISTICS 

First consider the distribution of the order statistics of a random sample 
where each sample member is taken from a possibly different population 
with its own distribution. 

Theorem 1 ( Bapat and Beg ( 19891 ). Theorem 4.2) The cumulative dis- 
tribution function of the order statistics satisfies 



.Vk) 



5Z r,- _ 



Ph,...,ik (2/1. • • -^Vk) 



{ii - io)\ (^2 - ^i)! ■ ■ ■ (ik+i - ikV-' 



(4) 



where 



Pii,...,ik (2/1' ■■■,yk) 



per 



(ij-'jj_i)xl 



j=k,i=m 
j=l,i=l 



(5) 



is the permanent of the block matrix with the block row index j and block 
column index i. The blocks have {ij — ij-i) rows, and 1 column each, which 
is denoted by the subscript {ij — ij-i) x 1. Each block has only one distinct 
entry, which is - Fi{yj_i)]. We take Fi (yo) = 0, Fi (yk+i) = 1- 



In expanded form, the permanent ^ can be written as 
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per 



Fi{yi) 



F2{yi) 



Fi{yk) - Fi{yk-i) F2{yk) - F^iyu-i) 

Fi{yk) - F^{yu-i) F^iy^) - F^iyu-i) 
[l-F,{y,)] \l-F2{y,)] 



[1 - Fi {yu)] 



[1 - F2 {yu)] 



FmiVl) 

Fm{yi) 

Fmiy2) - F^iyi) 
Fm{y2) - F^iyi) 

Fm{yk) - Fm{yk-i) 

FraiVk) - Fmiyk-l) 

[1 - F^ {yu)] 

[1 - Frr. (t/fc)] 



(6) 

where the j-th group, j = 1, . . . , k + 1, contains ij — ij-i repetitions of the 
same row. 



Proof. The theorem is stated, but not proved in lBapat and Begl (119891 ) . We 
provide a proof for the sake of completeness, and to prepare the ground for 
our result. 

Define yo = —00, and y^+i = 00. Note that for i G {1, 2, . . . , m}, Fi (?/o) = 
0, and Fi (yk+i) = 1, since the Fi are cumulative distribution functions. 
Denote A = Fy„^,...y„^ {yi, . . .,yk)- Then we have 

A = Fi < y,}^ = Pr {at least of X, < y,} j . (7) 

Denote by Ij the random variable equal to the number of Xj such that Xi < 
yj. Then /i < /2 < ■ ■ ■ < -^fc? and the condition that at least Uj of Xi < yj is 
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equivalent to /_,■ > uj. Thus, 

(k \ / 12 k \ 

r\{ij>nA]=p^[Or\{i^=h}]^ (8) 
i=i / \iGXi=i / 

and, since the events 0^=1 — ^'^^ different i are disjoint, 

iei \j=i / 

/k+i \ 

= 5Z I n {exactly ij - ij-i of Xi E {yj^i, yj]} J . (10) 
iei \j=i J 

Now fix i and write an arbitrary permutation of {1,2,..., m} as 

TT = (7ri,7r2, . . . ,7rfc,7rfc+i) , (11) 

where each subsequence nj has exactly ij — terms. We will use {nj} to 
denote the set of the terms. Then, 

37rVj G {1, 2, . . . , A; + 1} : exactly ij - ij^i of Xi e {yj-i, yj] (12) 
^ 37rVj e {1, 2, . . . , A; + 1} : e {tt,} : X, G {y,.u (13) 



Hence, 



/fc+i \ 
Pr I Pi {exactly ij - ij^i of Xi G {yj-i,yj]} j (14) 

_ E.pr (n-;;n.eK} ^ (y,-i,%]} 



(2l - lo)\ ■ ■ ■ [lk+1 - ifcj! 

E.n n [i^. (y,) - (z/.-i)] 

{h - ioV- ■ ■ ■ {ik+i - ikV- 



(15) 



(16) 



because the events in the intersection are independent: there is one event for 
each Xi, which are independent random variables. Substituting into Q) and 
comparing with the definition of the permanent ([T]) concludes the proof. ■ 
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As noted in the introduction, using a general algorithm for permanents is 
prohibitively expensive. Given simplifying assumptions, however, the prob- 
lem becomes easier. In the case when the variables Xi, X2, . . . ,Xm are in- 
dependent and identically distributed (that is, the classical case of sampling 
from a single population), Theore m [1] reduces to the following well-known 
result (jPavid and Nagarajal . l2003l . p. 11). 

Theorem 2 Suppose that Fi = F for all i. Then the joint cumulative dis- 
tribution function of the order statistics satisfies 



fe+i 



iex j=i 



[F (y,) - F (y,_0] 



(17) 



Now consider drawing a random sample from two populations, each with 
a different cumulative distribution function, say F{x), and G{x). Sample 
the first n random variables from the first population with the distribution 
function F, and then m — n from the second populati on with the distributio n 
function G. Then the permanents from Equation H] ( Bapat and Beg ( 19891 )) 
simplify to the block form with constant blocks. 



per 



(2/1, • • • ,1/fc) 

[F(yi) -F(yo)](,^_,„)x„ 

[^(?/2) -^(l/l)]fe-n)xn 



fc+l-«fcjX" 



[G{yi) - G{yo)] 

{ii — jq) X (m—n) 
[G(|/2) -G(l/i)](^^_^^)^(„_„) 



(18) 



where the subscripts indicate the dimensions of blocks created by the repe- 
tition of the term in the brackets, and we take 



F (2/0) = G (yo) = 0, F (yfc+i) = G (y,+i) = 1. 
In expanded form, the permanent f|T8|) can be written as 



(19) 
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per 



F{yi) ■■■ F{y^) G{y,) •■■ G{y,) 

F{yi) ■■■ F{y,) G{y,) ■■■ G{y^) 

F{y,)-F{y,) ■■■ F{y^) - F{y,) G{y2) - G{y,) ■■■ Giy,) - G{y,) 

F{y,)-F{:y,) ■■■ F{y^) - F{y,) G{y2) - G{yr) ■•• Giy^) ^ G{y,) 

F{yk)-F{yu-i) ■■■ F{yk) - F{yu-i) G{yu) - G{yk-i) ■■■ G{yk) - G{yk-i) 

F{yk) - F{yk-i) ■■■ F{yj,) ~ F{yk_,) G^y^) - G{yk-i) ■■■ G{yk) - G{y^^^) 

l-F{yu) ■■■ l-F{yk) I - G {yu) ■■■ I - G {y^) 

l-F{yu) ■■■ l-F{yu) I - G {yu) ■■■ I - G {y^) 

(20) 

This special form of the permanent allows us to evaluate the joint distri- 
bution of the order statistic more efficiently. 

Theorem 3 Suppose that Fi (x) = F {x), for all 1 < i < n, and Fi {x) = 
G (x), for all n + 1 <i <m. Then 



Fy„„...y„^ {yi,...,yk) = 

^^^^ UU.-i.i- \,)\ 
iei \ j=i ^ ^ ^ ^ ^ ^' 

■ [F (%•) - F [G (y,) - G bj^-i}^-''''-'' , (21) 

where A = (Ai, A2, . . . , A^+i) ranges over all integer vectors such that 

Ai + A2 + ■ ■ ■ + Afc+i = n, 0<Xj< ij - ij-i. (22) 

Proof. We evaluate the permanents Pii,...,i^^ {yi, ■ ■ ■ ,yk) from f|T8|) . Let 
Si = {l,2,...,n} and S2 = {n + l,n + 2, . . . ,m}. Write a permutation 
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Interval 


(-oo,?/i] 


(2/1,2/2] 


iVk, 00) 


Total 




Ai 


A2 


Afc+i 


n 




ii - Ai 


12-11- \2 ■ 


■ m-ik- Afc+i 


m — n 


Total 


k 


12 -ii 


m-ik 


m 



Table 1: Total number of order statistics in each interval, and number from 
population 1 and 2 in each interval. 



of {1, 2, ... , m} as 7r = (vTi, 772, . . . , tt/c, T^k+i)-, where each subsequence tTj has 
exactly ij — ij^i terms. The subsequence Hj is a list of the subscripts of the 
random variables that fall in the interval {yj~i,yj)- Then the term in the 
definition of the permanent ([T]) associated with vr is 

m k+1 

n ^^M^) = n (y^) - ^ (^^-i)]'^' (y^) - G (y^-^f'"'-''"' ^ (23) 
i=i j=i 

where Xj is the number of random variables with subscripts listed in {ttj} that 
are in Si. For illustration, the intervals and the number of order statistics of 
each type in them are shown in Table [H 

The number of permutations vr such that Xj is the number of the elements 
from {71 j} that are in 5*1 is found as the product ABC, where 

A = -TZ^, (24) 

is the number of ways to distribute the n elements of Si so that set j has Xj 
elements (the multinomial coefficient). 



Ujli ik - h-i - ^i)' 



B = TmT. : TTT (25) 



is the number of ways to distribute the m — n elements of Si so that set j 
has ij — ij-i — Xj elements, and 

k+1 

C = l[{^,-^,^l)\ (26) 
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is the number of permutations that do not change the distribution of the 
elements 5*1 and 5*2 into those sets. Thus, 

m 

Pii,...,h • • • > 2/fc) = X] n "^''^w 

TT 1 = 1 

k+1 ,. ■ \\ 

• \F (%•) - F \G (y,) - G {y,-,)r-'^-^-'^ , (27) 



with the sum over all A that satisfy (l22i) . The result now follows from 
Theorem [H ■ 

The proof of Theorem [3] easily carries over to the general case of order 
statistics of a sample selected from an arbitrary number of populations. The 
proof of the next theorem can therefore be omitted. 

Theorem 4 Suppose that Fi = Gi for the first nii indices i, Fi = G2 for the 
next m2 indices i, etc., and Fi = Gn for the last indices i, with 

mi + ■ ■ ■ + rriN = m, nig > for all s. (28) 

Then 

FY^„...Y^Jyu...,yk)= (29) 

k+l N , 

= E E n n ?! ^y^^ - ^y^-'^^'^^ 

iGl [A,,] i=l s=l 

where the summation is over all integer matrices size k + l by N such 
that 



^js > for all j and all s, (31) 



fc+i 



Xjs = m for all s, (32) 

N 

= ij - ij-i for all j, (33) 
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and we take Gg (?/o) = 0, Gg (Vk+i) = 1- 



Theorem H] covers all of the theorems above. In the particular case when 
all rrii = 1, i.e., every distribution is different because it comes from a dif- 
ferent population, it gives exactly the same result as Theorem [H With two 
populations, the complexity of Theorem H] reduces to the complexity of The- 
orem j3l_Ilie_compkxity_o Theorem [3] is less than that of the Theorem [1] 
from lBapat and Beg (119891 ). as discussed in the next section. 



4 COMPLEXITY 



We wi ll now compare the relative complexity of Theorem^ from lBapat and Beg 
(1l989l ). and our formula. Theorem [31 We assume that the evaluation of the 
cumulative distribution function of each of the statistics takes a constant 
number of operations. 

For 1 < rii < n2 < ■ ■ ■ < rik < m, denote the number of elements of the 
index set I by 



«fc="fc «fe-i='T'fc-l n=ni 



Theorem 5 The number v (rii, n2, ■ ■ ■ , n^; m) oj the Bapat-Beg permanents 
in Theorem U\ is bounded by 



u{l,2,...,k;m)={"'~^^] (l-__), (36) 



u (rii, ris, . . . , Uk, m) < z/ (1, 2, . . . , /c; m) < z/ (1, 2, . . . , m; m) = Cm, (35) 
where 

( w 

;y n 9 k -m\ = \ 

k J \ m+ 1^ 
and 

C^.^(^-).-J^^. (37) 

m+l\m/ [m+ly.ml 

Proof. The inequalities in fl35l) are obtained by taking the smallest numbers 
for ni,n2, ■ ■ ■ ,nk and the largest possible value for k, which both give the 
largest number of terms. We now prove that 

fm + k\ fm + k\ , , 

.(1.2.....Mn)= , -L-l 
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by induction over k. For k = 1, fl38l) follows from 

z/(l;m) = ^1 

and 



m 



ii=i 



m + 1 
1 



m + 1 
1 - 1 



(m + 1) — 1 = m. 



Now assume that 0381) holds for some k and we will show that 



m + k + 1\ fm + k + 1 



i^(l,2,...,/c + l,m) 



k + l 



k 



(39) 



(40) 



(41) 



From the definition fl51|) and the induction assumption (1551) . it follows that 

(42) 



i^(l,2,...,/c + l;m) = z/(l,2, ...,/c;Zfc+i) 

ife+i=fe+i 

z + A;\ f i + k 



i=k+l 



where we have used the identity 



E 

i=k+l 
m 

E 

+ 
? 

E 

i=k+l 



n — 1 
r 



k 

i + k + 1 
k + l 

i + k 
k 



n — 1 
r — 1 



k-1 



i + k 
k + l 

i + k + 1 
k 



(43) 
(44) 
(45) 



(46) 



twice. Both sums telescope, and we get 
1^(1,2, A; + l:m) = 



m + k + 1 

k + l 
m + k + 1 
k 



2k + 1 
k + l 
2k + 1 
k 



(47) 
(48) 
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which, noting that 

\k + lj {k + l)\k\ \ k J' ^ ^ 

gives pij) . Equations fl36|) and fl37|) follow from fl38|) by a direct computation: 
+ k\ fm + k\ m + km + k — 1 m + 2m+l 

(50) 
(51) 



k J \k-l J 1 2 k-1 k 

m + km + k — 1 m + 2 



and 



m + m\ /m + m\ /2m\ / m \ 1 /2m\ 
m J \m — 1 J \m J \ m + ij m+l\m/' 

which concludes the proof. ■ 

The n umbe rs Cm defined by fl371) are known as the Catalan numbers 



( jStanleyl . Il999l ). and the numbe rs ak^m = (1, 2, . . . , fc; m) are called the 



Catalan triangle (IShapird . Il976l ). From the Stirling approximation m! 



V^vrmm"^/e™', the growth of Catalan numbers is exponential, 

Cm ~ const > const a"', (54) 

for any 1 < a < 4 (with a different const for each a). 

Theorem 6 The worst case complexity of computing the distribution func- 
tion of the order statistics from Theorem U\ is 

const CmmK"' ~ const m-^/^A"'P{m), (55) 

where P{m) is the number of operations for computing permanent of order 
m. 

Proof. The denominator in (jl]) requires at most O (m) operations, and there 
are at most Cm terms in the sum by Theorem [51 ■ 
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It is known that the complexity of computing the permanent is bounded 

by 

P{m) = 0{m''2'^) 

for some a, e.g., from the Ryser's formula ( Knuthl . 1998 ). So, the complexity 



of the computation of the distribution function from Theorem[T]is exponential 
in m. Therefore, the computation is practical only for small m. 

Fortunately, a drastic reduction of complexity is possible in the case when 
the order statistics come from two populations. In fact, the complexity re- 
duces still farther when we need only a small number k of order statistics. 

Theorem 7 Let C {k, m, n) be the number of operations in Theorem to 
evaluate the joint distribution function of k order statistics from m random 
variables from two populations, with n < m of the variables from the first 
population. Then 

C(.,»,„)<con.U.('" + *)(";*)(l--A_). (56) 

In the worst case over all k and n, the complexity is bounded by 

(2m) ^ 

C {k, m, n) < const m ^ ~ const 16™", (57) 

(m!) 

For any fixed k, the complexity is bounded by 

C {k,m,n) = O {m'^n'') . (58) 
i.e., the complexity is polynomial in m. 

Proof. The complexity is bounded by const CLM, where C = C"^^) (l — ^) 
is the number of terms in the sum over i, L is the number of possible index 
vectors A satisfying (122|) . and M is the complexity of evaluating the products 
in one term of the sum, which is M = O (A;). To bound L, drop the upper 
bounds in fl22p . Thus L is bounded above by the number of all integer vectors 
A such that 

Ai + Aa + ■ ■ ■ + Afe+i = n, A^ > for all j, (59) 

which is the same as the number of ways to distribute n indistinguishable 
objects to /c + 1 distinguishable bins, which equals to ("^'^)- This gives (!56|l . 
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General method 




k 



3 



Figure 1: Times for evaluating the joint cumulative distribution function 
of the first k order statistics of m random variables from two distributions, 
using the general Bapat-Beg formula (Theorem [1]). 

The bound ( 1571) follows by taking a pessimistic value of k in each term 
( I56l) - twice k = m, then k = 0, and pessimistic value n = m. The second 
part of fl571) follows from the Stirling formula. 

The polynomial bound fl58|) follows from fl56|) and the inequality 



applied with p = m and p = n. ■ 

Although the complexity of evaluating the cumulative distribution func- 
tion of order statistics from Theorem [1] is exponential in the general case, we 
have shown in Theorem [7| that the complexity is bounded by a polynomial of 
a small degree when there are only two populations, and the number of order 
statistics considered, k, is fixed and small. The complexity also depends on 
n, the number of random variables from the first population, 5*1. In general, 
n is fixed by the state of nature. 

To confirm and illustrate the result, we have conducted a timing experi- 
ment. We calculated the joint distribution function in the case of two popula- 
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New method 




Figure 2: Times for evaluating the joint cumulative distribution function 
of the first k order statistics of m random variables from two distributions, 
using the new formula from Theorem [3l 



Bapat-Beg formula 
Theorem [1], Fig. [1] 


New formula 
Theorem [31 Fig. 


m 


Improvement 
Fig.E] 


]^g-2.9-0.36A:^2.0+l.lA: 


]^g-2.6-0.01fc^0.06 


4-1.02fc 


]^g-0.30-0.34A:^1.93+0.09fc 



Table 2: Fit of timing in Mathematica of the evaluation of the joint distribu- 
tion of the first k statistics of m variables from two populations (n=l from 
one population, m — n from the other). For fixed /c, regression was used to 
fit the logarithm of the time with a linear function of log m, and regression 
was then used again to fit the coefficients by linear functions of k. 
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Improvement 




Figure 3: Ratio of times for evaluating the joint cumulative distribution 
function of the first k order statistics of m random variables from two dis- 
tributions, using the Bapat-Beg formula (Theorem [1]) and the new formula 
from Theorem [31 
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tions. We considered k = 1, k = 2, and k = 3, and fixed n = 1. We measured 
the amount of time it took to compute tlie joint distribution function using 
the general Bapat Beg formula with permanents (Fig. [1]) and the new special 
formula (Fig. [2]). Both theorems were implemented in Mathematica . The 
permanents were computed in Mathematica using the code 

Permanent [A_List] :=With[v = Array[x, Length[A]], 

Coeff icient[Times@@(A.v),Times@(av] 



from IWeissteinI (120061 ). This function computes the permanent of matrix A 



by Vardi's formula as the coefficient of 

m 

(ttjiXi + ai2X2 + ■ ■ ■ + CLim^m) 5 

i=l 

using symbolic manipulation with automatic caching of partial results by 
the Mathematica kernel. Amazingly, calculating the permanent from (|T8l) in 
Mathematica results in times that grow polynomially with m, the number of 
rows in the permanent. Consequently, for two populations, while the theoret- 
ical complexity of Bapat Beg is exponential, the actual time observed while 
calculating the formulas in Mathematica was polynomial (Fig.[T]). Graphing 
the time versus the log of m produces almost straight lines in a log- log plot. 
We attribute this speedup to the reuse of partial results by the Mathematica 
kernel. 

Mathematica calculates the Bapat Beg formula more rapidly than pre- 
dicted. In the timing experiment, the observed times for the new formula 
(Theorem [3]) are much faster than the Bapat Beg formula. The observed 
improvement was quite dramatic (Fig. [3]). The observed improvement is of 
the order (Table [2]). The observed complexity of the new formula for two 
populations was of the order m^, which confirms the result of Theorem [7] for 
constant n = 1. 

All calculations were done using a custom New Tech Solutions workstation 
with 4 AMD Opteron 848 processors running Mathematica 5.2, under the 
SuSE Linux Enterprise Server 10 operating system. 

Mathematica code to calculate the cumulative distribution function for 
arbitrary collections of order statistics of independent random variables which 
may have different distributions is available free from the authors. Examples 
demonstrating the use of the software are also available from the authors. 
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